NBIS ID: SMS_6198
Report Version: 1.0
Request by: Mona N. Högberg ()
Principal Investigator: Mona N. Högberg ()
Organisation: SLU
NBIS Staff: Juliana Assis ()


1 Setup

2 Version

1.0

  • Support Request Request sent by the user:
    Mona N. Högberg

3 Data

96 samples

  • Type of data

ITS amplicon

  • Data location rackham.uppmax.uu.se /proj/snic2022-22-352

  • Uppmax project ID SNIC 2022/22-352

  • NGI Project ID P9723

  • Database used Unite

4 Tools

NFCore-Ampliseq (Dada2)

#Sample Info
head(sample_info_tab)

5 Workflow

Inspect read quality profiles

Quality profiles of the forward reads:

QC Foward Reads.

QC Foward Reads.

In gray-scale is a heat map of the frequency of each quality score at each base position. The median quality score at each position is shown by the green line, and the quartiles of the quality score distribution by the orange lines. The red line shows the scaled proportion of reads that extend to at least that position

The forward reads are good quality I truncated the forward reads at position 240.

Quality profile of the reverse reads:

QC Reverse Reads.

QC Reverse Reads.

The reverse reads are of significantly worse quality, especially at the end, which is common in Illumina sequencing. This isn’t too worrisome, as DADA2 incorporates quality information into its error model which makes the algorithm robust to lower quality sequence, but trimming as the average qualities crash will improve the algorithm’s sensitivity to rare sequence variants. Based on these profiles, I truncated the reverse reads at position 200 where the quality distribution crashes.

Identify primers

The universal primer set 5.8S and ITS1F (Fierer et al., 2005; Yarwood et al., 2010) primers were used to amplify this dataset. We record the DNA sequences, including ambiguous nucleotides, for those primers.

FWD <- “CTTGGTCATTTAGAGGAAGTAA”
REV <- “GCTGCGTTCTTCATCGATGC”

sanity check of primers and adapters

##                  Forward Complement Reverse RevComp
## FWD.ForwardReads       0          0       0       0
## FWD.ReverseReads       0          0       0       0
## REV.ForwardReads       0          0       0       0
## REV.ReverseReads       0          0       0       0

Success! Primers are no longer detected in the cutadapted reads.

The primer-free sequence files are now ready to be analyzed through the DADA2 pipeline.

Learn the Error Rates

Error Rate Foward Reads.

Error Rate Foward Reads.

Error Rate Foward Reads.

Error Rate Foward Reads.

Everything looks reasonable and we proceed with confidence.

QC Foward Reads.

QC Foward Reads.

## null device 
##           1

5.1 Replicates Evaluation

Alpha Diversity Plot comparing the Replicates A and B.

Quality control analysis using matched samples from 3 different Transects: A, B and C of the experiment and replicates samples on each Ecotype. Comparison of alpha diversity in technical replicates samples on all Ecotypes from each Transect. ASV richness and ASV Shannon diversity.

  • Beta Diversity Plot comparing the Replicates A and B.

* Beta Diversity Plot comparing the Replicates A and B.

Rarefaction

5.2 Main Analysis

  • Alpha Diversity Plot - A.

5.3 raw alpha-diversity, i.e. without rarefying

  • comparing alpha diversity based on raw data has the huge problem of ignoring sample size differences. Usually there is a trend that more reads correlate with higher diversity.
  • Here, therefore I checked, whether there are significant sample size differences between the groups.
  • We further look at the residuals of a linear fit alpha diversity vs sample size to correct for this confounder.

5.3.1 Check whether sample_sums/sample sizes/library sizes differ between groups

  • sample size adjustment has no influence on richness, so sample_sizes are compared for raw counts

Alpha Diversity = BoxPlot

#https://grunwaldlab.github.io/analysis_of_microbiome_community_data_in_r/07--diversity_stats.html
anova_result <- aov(df2$Observed ~ Transect * Ecotype, data = df2)
summary(aov(df2$Observed ~ Transect * Ecotype, data = df2))
##                  Df Sum Sq Mean Sq F value        Pr(>F)    
## Transect          2   1715     858   0.733      0.488718    
## Ecotype           4 151580   37895  32.400 0.00000000017 ***
## Transect:Ecotype  8  51828    6478   5.539      0.000243 ***
## Residuals        30  35087    1170                          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aresult <- HSD.test(anova_result, "Ecotype", group = TRUE)

Plot by mean

PY4 <- levels(tt$Ecotype) # get the variables
PY.pairs4 <- combn(seq_along(PY4), 2, simplify = FALSE, FUN = function(i)PY4[i])

##
options(ggrepel.max.overlaps = Inf)
p4 <- tt %>% 
  ggplot(aes(x = Ecotype, y = Observed, color = Ecotype)) + #,shape = Type
  geom_point(size = 5, color = "grey") +
  labs(x = "", y = "") +
  theme_pubr(border = TRUE) +
  scale_colour_manual(values = cols_Ecotype) +
  theme(legend.position="top") +
  theme(legend.title = element_blank()) +
  facet_nested(~Transect) +
  stat_compare_means( method='wilcox.test', p.adjust.method = "BH", label = "p.signif", comparisons = PY.pairs4, size = 2) 

p4 + 
  geom_point(data=p4$data, aes(x = Ecotype, y = Observed, fill =Ecotype, size = 4))  +
  scale_fill_manual(values = cols_Ecotype) 

  • Beta Diversity Plot - A.
gpca  <- ordinate(Plots, "MDS")
# Scree plot
plot_scree(gpca, "Scree Plot for MDS Analysis")

#Remove ASVs that do not show appear more than 5 times in more than half the samples
genefilter = genefilter_sample(P_Plots, filterfun_sample(function(x) x > 5), A=0.5*nsamples(P_Plots))
genefilter_tax = prune_taxa(genefilter, P_Plots)

#Transform to even sampling depth.
genefilter_tax_t = transform_sample_counts(genefilter_tax, function(x) 1E6 * x/sum(x))

genefilter_tax_t.ord <- ordinate(genefilter_tax_t, "NMDS")
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1177618 
## Run 1 stress 0.1177617 
## ... New best solution
## ... Procrustes: rmse 0.0003454533  max resid 0.002010498 
## ... Similar to previous best
## Run 2 stress 0.1161722 
## ... New best solution
## ... Procrustes: rmse 0.04925975  max resid 0.2692136 
## Run 3 stress 0.1363471 
## Run 4 stress 0.1161722 
## ... New best solution
## ... Procrustes: rmse 0.00002537841  max resid 0.00009047779 
## ... Similar to previous best
## Run 5 stress 0.1384023 
## Run 6 stress 0.1339011 
## Run 7 stress 0.1344903 
## Run 8 stress 0.1474127 
## Run 9 stress 0.1345524 
## Run 10 stress 0.1177619 
## Run 11 stress 0.1161722 
## ... New best solution
## ... Procrustes: rmse 0.000003784609  max resid 0.0000169716 
## ... Similar to previous best
## Run 12 stress 0.1500831 
## Run 13 stress 0.1177619 
## Run 14 stress 0.1161722 
## ... New best solution
## ... Procrustes: rmse 0.00001210068  max resid 0.00006047277 
## ... Similar to previous best
## Run 15 stress 0.1151886 
## ... New best solution
## ... Procrustes: rmse 0.01126614  max resid 0.06410793 
## Run 16 stress 0.136347 
## Run 17 stress 0.1350689 
## Run 18 stress 0.1151887 
## ... Procrustes: rmse 0.00005830998  max resid 0.0002856496 
## ... Similar to previous best
## Run 19 stress 0.1177616 
## Run 20 stress 0.1151886 
## ... Procrustes: rmse 0.0000147822  max resid 0.00008000913 
## ... Similar to previous best
## *** Solution reached
#
genefilter_tax_t@sam_data$Transect <- factor(genefilter_tax_t@sam_data$Transect, levels = c(1,2,3), labels = c("1","2", "3"))

genefilter_tax_t@sam_data$Ecotype <- factor(genefilter_tax_t@sam_data$Ecotype, levels = c("Meadow","Alder zone","Spruce-Alder", "Spruce","Pine"), labels = c("Meadow","Alder","Spruce-Alder", "Spruce","Pine"))

#
ord_DataFrame_tx <- plot_ordination(genefilter_tax_t, genefilter_tax_t.ord, "samples", justDF ="TRUE")

#
ordplot_tx <- (ordplot <- plot_ordination(genefilter_tax_t, genefilter_tax_t.ord, "samples", color="Ecotype", shape = "Transect") + 
               geom_hline(yintercept = 0, linetype = 'dashed', alpha = 0.3) + 
               geom_vline(xintercept = 0, linetype = 'dashed', alpha = 0.3) +
               geom_point(size = 5, color = "grey") +
               scale_shape_manual(values=c(23,21,24)) +
               theme_pubr(border = TRUE) +
               coord_fixed(ratio = 1) +
               theme(axis.text=element_text(size=14), 
                     axis.text.x = element_text(size = 12, hjust = 0.5), 
                     axis.title.y = element_text(size = 18),
                     legend.text=element_text(size=14), 
                     legend.title=element_text(size=0),
                     legend.position="bottom",
                     axis.title.x = element_text(size = 18), 
                     strip.text.x = element_text(size = 20, face = "bold"))) +
  scale_color_manual(values = cols_Ecotype)#+
ordplot_tx$layers <- ordplot_tx$layers[-1]
ordplot_tx + 
  geom_point(data=ord_DataFrame_tx, aes(x = NMDS1, y = NMDS2, fill =Ecotype, size = 4))  +
  scale_fill_manual(values = cols_Ecotype) 

pord2 = (ordplot <- plot_ordination(genefilter_tax_t, genefilter_tax_t.ord, "samples", color="Ecotype", shape = "Transect") + 
               geom_hline(yintercept = 0, linetype = 'dashed', alpha = 0.3) + 
               geom_vline(xintercept = 0, linetype = 'dashed', alpha = 0.3) +
               geom_point(size = 5, color = "grey") +
               scale_shape_manual(values=c(23,21,24)) +
               geom_polygon(aes(fill=Ecotype, alpha = 0.8)) + 
               theme_pubr(border = TRUE) +
               coord_fixed(ratio = 1) +
               theme(axis.text=element_text(size=14), 
                     axis.text.x = element_text(size = 12, hjust = 0.5), 
                     axis.title.y = element_text(size = 18),
                     legend.text=element_text(size=14), 
                     legend.title=element_text(size=0),
                     legend.position="bottom",
                     axis.title.x = element_text(size = 18), 
                     strip.text.x = element_text(size = 20, face = "bold"))) +
  scale_color_manual(values = cols_Ecotype)

pord2$layers <- pord2$layers[-1]
pord2 + 
  geom_point(data=ord_DataFrame_tx, aes(x = NMDS1, y = NMDS2, fill =Ecotype, size = 4))  +
  scale_fill_manual(values = cols_Ecotype) 

  • Beta Diversity Plot - B.

Ecotype

  • Permutational analysis of variance

HeatMap

#Heatmap 
meta_data_2 <- read.table("/Users/juliana/Documents/NBIS/Projects/6198/Metadado/metadata_copynumber_sampleA.tsv", header=T, row.names=1, check.names=T, sep="\t")
sample_data(P_Plots) <- meta_data_2

P_Plots@sam_data$CopyNumber2 <- as.numeric(P_Plots@sam_data$CopyNumber)


#Remove ASVs that do not show appear more than 3 times in more than 30% the samples
genefilter2 = genefilter_sample(P_Plots, filterfun_sample(function(x) x > 3), A=0.3*nsamples(P_Plots))
genefilter_tax2 = prune_taxa(genefilter2, P_Plots)
#MetaData
metaR <- meta(genefilter_tax2)

# Reorder Transect
metaR$Transect = factor(metaR$Transect, levels = c("1", "2","3")) #D46C4E
metaR_Transect <- c("#C0C0C0","#BC8F8F","#F5DEB3") #D46C4E #43978D
names(metaR_Transect) <- levels(metaR$Transect)

# Reorder Ecotype
metaR$Ecotype <- factor(metaR$Ecotype, levels = c("Meadow","Alder zone","Spruce-Alder", "Spruce","Pine"), labels = c("Meadow","Alder","Spruce-Alder", "Spruce","Pine"))
metaR_Ecotype <- c("#77A515", "#264D59", "#43978D", "#D46C4E", "#2b8cbe")
names(metaR_Ecotype) <- levels(metaR$Ecotype)

# Add to a list, where names match those in factors dataframe
metaR_AnnColour <- list(
  Transect = metaR_Transect)

metaR_AnnColour2 <- list(
  Ecotype = metaR_Ecotype)

metaR_AnnColourx <- list(
  Transect = metaR_Transect,
  Ecotype = metaR_Ecotype)

# Check the output
metaR_AnnColour
## $Transect
##         1         2         3 
## "#C0C0C0" "#BC8F8F" "#F5DEB3"
SampleOrder = order(metaR$Ecotype, metaR$Transect)
meta.factors <- select(metaR, Transect, Ecotype)


metaR_Filter_composi.filt.abs <- P_Plots
#Subset #Multiplicando pela soma das reads, normalizando
metaR_Filter_composi.filt.ab <- transform_sample_counts(genefilter_tax2, function(x)100*x/sum(x))

###
{
  for(n in 1:nsamples(metaR_Filter_composi.filt.ab))
    
    otu_table(metaR_Filter_composi.filt.abs)<- otu_table(metaR_Filter_composi.filt.ab)*sample_data(metaR_Filter_composi.filt.ab@sam_data)$CopyNumber [n]
}

abs_plot <- data.frame(otu_table(metaR_Filter_composi.filt.abs))

#Colocando ASV names na OTU table
rownames(abs_plot)
##  [1] "ASV1"   "ASV3"   "ASV4"   "ASV5"   "ASV7"   "ASV8"   "ASV11"  "ASV13" 
##  [9] "ASV14"  "ASV18"  "ASV21"  "ASV22"  "ASV25"  "ASV29"  "ASV31"  "ASV34" 
## [17] "ASV37"  "ASV39"  "ASV46"  "ASV48"  "ASV52"  "ASV55"  "ASV56"  "ASV57" 
## [25] "ASV62"  "ASV73"  "ASV82"  "ASV83"  "ASV84"  "ASV85"  "ASV90"  "ASV99" 
## [33] "ASV100" "ASV106" "ASV111" "ASV125" "ASV129" "ASV130" "ASV132" "ASV135"
## [41] "ASV138" "ASV140" "ASV153" "ASV155" "ASV162" "ASV163" "ASV170" "ASV174"
## [49] "ASV176" "ASV187" "ASV197" "ASV202" "ASV206" "ASV208" "ASV210" "ASV212"
## [57] "ASV218" "ASV220" "ASV222" "ASV229" "ASV237" "ASV248" "ASV265" "ASV271"
## [65] "ASV273" "ASV278" "ASV290" "ASV292" "ASV293" "ASV295" "ASV299" "ASV311"
## [73] "ASV316" "ASV319" "ASV341" "ASV347" "ASV349" "ASV352" "ASV362" "ASV376"
## [81] "ASV383" "ASV386" "ASV392" "ASV416" "ASV423" "ASV429" "ASV443" "ASV453"
## [89] "ASV467" "ASV479" "ASV546" "ASV572" "ASV597" "ASV610" "ASV633" "ASV778"
## [97] "ASV779" "ASV787"
taxonomy_otu_compositional_copy <- data.frame(tax_table(metaR_Filter_composi.filt.abs))
tax_m <- taxonomy_otu_compositional_copy$taxa_name
rownames(abs_plot) <- tax_m
abs_plot
##Plot
plot_Log10_max <- log10(abs_plot)/max(log10(abs_plot) +1)
#plot_Log10_max <- log10(abs_plot +1)
plot_Log10_max[plot_Log10_max == "-Inf"] <- 0

colsHeat<- c("#F7F7F7", "#92C5DE", "#0571B0", "#F4A582", "#CA0020")

pheatmap(plot_Log10_max[, SampleOrder],
         cluster_cols = FALSE,
         cluster_rows = TRUE,
         gaps_row = 5, 
         clustering_distance_rows = "euclidean",
         clustering_distance_cols  = "euclidean",
         annotation_colors = metaR_AnnColourx, annotation_col = meta.factors, 
         show_colnames = FALSE,
         color = colorRampPalette(c("white", colsHeat))(50),
         border_color = "#f8edeb",
         display_numbers = FALSE)

meta.factorsT <- select(metaR, Transect)
meta.factorsE <- select(metaR, Ecotype)
Taxa_DataFrame <- data.frame(tax_table(metaR_Filter_composi.filt.abs))
Taxa_DataFrame_F <- Taxa_DataFrame %>% select(1,2,3,4,5,6,10)
Taxa_Matrix <- as.matrix(Taxa_DataFrame_F)

pseq_plot_heat <- metaR_Filter_composi.filt.abs

rown <- rownames(sample_data(pseq_plot_heat))
sample_data(pseq_plot_heat)$SampleID <- rown
sample_data(pseq_plot_heat) <- (sample_data(pseq_plot_heat)[, c(14,1,2,3,4,5,6,7,8,9,10,11,12,13)])

tax_table(pseq_plot_heat) <- as.matrix(Taxa_Matrix)

count_f <- data.frame(otu_table(pseq_plot_heat))
meta_f <- data.frame(sample_data(pseq_plot_heat))
taxa_f <- data.frame(tax_table(pseq_plot_heat))
names(taxa_f)[7] <- "Species"


d <- amp_load(
  otutable = count_f,
  metadata = meta_f,
  taxonomy = taxa_f
)
## Warning: Could not find a column named OTU/ASV in otutable, using rownames as
## OTU ID's
## Warning: Could not find a column named OTU/ASV in taxonomy, using rownames as
## OTU ID's
teste <-  amp_heatmap(
  d,
  group_by = c("Transect"),
  facet_by = c("Ecotype"),
  normalise = TRUE,
  #tax_add = "OTU",
  tax_aggregate = "Species",
  tax_show = 100,
  showRemainingTaxa = FALSE,
  tax_class = NULL,
  tax_empty = "best",
  plot_values = FALSE,
  plot_values_size = 3,
  plot_legendbreaks = NULL,
  plot_colorscale = "log10",
  plot_na = TRUE,
  measure = "mean",
  #sort_by = "Transect 1",
  min_abundance = 0.01,
  max_abundance = NULL,
  normalise_by = NULL,
  scale_by = NULL,
  color_vector = colorRampPalette(c("white", colsHeat))(50),
  round = 1,
  textmap = FALSE,
  plot_functions = FALSE,
  function_data = FALSE,
  functions = c("MiDAS", "Filamentous", "AOB", "NOB", "PAO", "GAO"),
  rel_widths = c(0.75, 0.25)
)
## Warning: The data has already been normalised. Setting normalise = TRUE (the
## default) will normalise the data again and the relative abundance information
## about the original data of which the provided data is a subset will be lost.
## Warning: There are only 98 taxa, showing all
print(teste)
## Warning: Transformation introduced infinite values in discrete y-axis

  • Relative abundance of the top 20 ASVs

6 Summary

Help is needed with running the “nfcore/ampliseq” pipeline developed at NGI for the analyses of fungal ITS1 amplicons, Illumina Miseq analysis NGI project ID P5953 (M.Hogberg_17_01_project summary from 2018-01-19 by Chuan Wang refers to P9723, >=Q30 (mean(SD), 70(2) (%), Sum Reads=15 650 000, Mean reads per sample 171 711, 1 pool of amplicons, 1 flowcell v3, PE 2x301bp (validated method), demultiplexing, quality control and raw data delivery on Uppmax (validated method). Agreement number M.Hogberg_16_01-20160826. Grus delivery project delivery 00654. Because my support application 2019-01-17 was rejected, I have collaborated with partners in US on this matter but all is extremely delayed for known reasons. I got some hope today when I read about the recently developed pipeline for fungal analyses! Unfortunately, I have no programming skills but have a BSc in Molecular Biology.

Short summary of the work.

7 Further Work

Further steps to be taken (if needed).

8 References

Relevant references for methods, tools etc.

9 Deliverables

Files delivered to the user with descriptions.

9.1 Directory

/data/processed/b/

8 directories, 18 files

Total size is XX GB.

10 Timeline

11 Practical Info

11.1 Data responsibility

The responsibility for data archiving lies with the PI of the project. We do not offer long-term storage or retrieval of data.

  • NBIS & Uppnex: We kindly ask that you remove the files from UPPMAX/UPPNEX. The main storage at UPPNEX is optimized for high-speed and parallel access, which makes it expensive and not the right place for longer time archiving. Please consider others by not taking up the expensive space. Please note that UPPMAX is a resource separate from the Bioinformatics Platform, administered by the Swedish National Infrastructure for Computing (SNIC) and SNIC-specifc project rules apply to all projects hosted at UPPMAX.
  • Sensitive data : Please note that special considerations may apply to the human-derived legally considered sensitive personal data. These should be handled according to specific laws and regulations as outlined e.g. here.
  • Long-term backup : We recommend asking your local IT for support with long-term data archiving. Also a newly established Data Office at SciLifeLab may be of help to discuss other options.

11.2 Acknowledgments

If you are presenting the results in a paper, at a workshop or conference, we kindly ask you to acknowledge us.

  • NBIS staff are encouraged to be co-authors when this is merited in accordance to the ethical recommendations for authorship, e.g. ICMJE recommendations. If applicable, please include Juliana, Assis Geraldo, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, NBIS, as co-author. In other cases, NBIS would be grateful if support by us is acknowledged in publications according to this example:

“Support by NBIS (National Bioinformatics Infrastructure Sweden) is gratefully acknowledged.”

“The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project SNIC 2022-22-352.”

  • NGI : For publications based on data from NGI Sweden, NGI, SciLifeLab and UPPMAX should be acknowledged like so:

“The authors would like to acknowledge support from Science for Life Laboratory (SciLifeLab), the National Genomics Infrastructure (NGI), and Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) for providing assistance in massive parallel sequencing and computational infrastructure.”

12 Support Completion

You should soon be contacted by one of our managers with a request to close down the project in our internal system and for invoicing matters. If we do not hear from you within 30 days the project will be automatically closed and invoice sent. Again, we would like to remind you about data responsibility and acknowledgements, see sections: Data Responsibility and Acknowledgments.

You are welcome to come back to us with further data analysis request at any time via http://nbis.se/support/support.html.

Thank you for using NBIS.